Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge

Authors

  • Pejman Mowlaee Begzade Mahale
  • Rahim Saeidi
  • Zheng-Hua Tan
  • Mads Græsbøll Christensen
  • Tomi Kinnunen
  • Pasi Fränti
  • Søren Holdt Jensen
Abstract

Most single-channel speech separation (SCSS) systems use the short-time Fourier transform as their parametric features. Recent studies have shown that employing sinusoidal features for SCSS results in high perceived speech quality. In this paper, we present a systematic study of automatic speech recognition results for an SCSS system that uses sinusoidal features composed of amplitude and frequency. We compare the speech recognition results with those already reported by other participants in the single-channel speech separation and recognition challenge. Our results show that the newly proposed system achieves an overall recognition accuracy of 52.3%, which places it at the median among the participants in the challenge.
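To make "sinusoidal features composed of amplitude and frequency" concrete, the following is a minimal, generic sketch of spectral peak picking on a single frame. It is an illustration under stated assumptions, not the authors' system: the function name `sinusoidal_features`, the Hann window, the naive DFT, and the `num_peaks` parameter are all hypothetical choices made for this example.

```python
import cmath
import math


def sinusoidal_features(frame, sample_rate, num_peaks=2):
    """Return up to num_peaks (amplitude, frequency_hz) pairs for one frame.

    Hypothetical helper for illustration only; the paper's actual
    analysis procedure is not described on this page.
    """
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage before peak picking.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    window_gain = sum(window)

    # Naive DFT over the non-negative frequency bins (fine for short frames).
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        # Scale so a sinusoid centred on a bin reads close to its amplitude.
        magnitudes.append(2 * abs(acc) / window_gain)

    # Local maxima of the magnitude spectrum are the sinusoid candidates.
    peaks = [
        (magnitudes[k], k * sample_rate / n)
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]
    ]
    # Keep the strongest peaks, then report them in ascending frequency.
    strongest = sorted(peaks, reverse=True)[:num_peaks]
    return sorted(strongest, key=lambda p: p[1])
```

Each frame is thus reduced to a handful of amplitude and frequency pairs instead of a full short-time Fourier transform magnitude vector. A practical system would use an FFT and refine the peak frequencies by interpolation; both are omitted here for brevity.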

Similar resources

The 2nd ‘CHiME’ Speech Separation and Recognition Challenge: Approaches on Single-channel Source Separation and Model-driven Speech Enhancement

In this paper, we address the small vocabulary track (track 1) of the CHiME 2 challenge, which is dedicated to recognizing utterances of a target speaker with small head movements. The utterances are recorded in reverberant room acoustics and corrupted by highly non-stationary noise sources. Such an adverse noise scenario poses a challenge to state-of-the-art automatic speech recognition systems. ...

Unconstrained Speech Separation by Composition of Longest Segments

A data-driven approach is presented for improving the performance of separating single-channel mixed speech signals, assuming unknown, arbitrary temporal dynamics. The new approach seeks and separates the longest mixed speech segments which can be accurately matched by composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the matching constituent...

The ICSTM+TUM+UP Approach to the 3rd CHIME Challenge: Single-Channel LSTM Speech Enhancement with Multi-Channel Correlation Shaping Dereverberation and LSTM Language Models

This paper presents our contribution to the 3rd CHiME Speech Separation and Recognition Challenge. Our system uses Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Networks (RNNs) for Single-channel Speech Enhancement (SSE). Networks are trained to predict clean speech as well as noise features from noisy speech features. In addition, the system applies two methods of dereverberati...

Separating Speech from Speech Noise

The main work at Columbia this year has been the development of algorithms for extracting and recognizing speech in nonstationary, noisy environments when only a single microphone channel is available. Our particular approach is based on using trained models to distinguish regions of time-frequency containing speech from nonspeech areas [2], and we have pursued this along several directions: On...

Monaural speech separation and recognition challenge

Robust speech recognition in everyday conditions requires the solution to a number of challenging problems, not least the ability to handle multiple sound sources. The specific case of speech recognition in the presence of a competing talker has been studied for several decades, resulting in a number of quite distinct algorithmic solutions whose focus ranges from modeling both target and compet...

Publication date: 2011